Intel C/C++ and FORTRAN Compilers

White Paper

The Intel C/C++ and FORTRAN compiler plug-ins allow software developers to gain superior performance on Intel Architecture processors. The objective of these compilers is to allow software to take advantage of the full potential of Intel Architecture processors. The Intel C/C++ and FORTRAN compiler plug-ins are the first compilers to offer processor-specific optimizations with the introduction of each new Intel processor generation. This allows developers to take immediate advantage of each new processor's improvements. Support of MMX™ technology is a vivid example of the Intel compilers’ evolution in parallel with the latest line of Intel Architecture processors.

Feature Summary

Intel compilers have been developed to be compatible with Microsoft* Visual C++*. To ensure ease of use, the Intel C/C++ and FORTRAN compiler plug-ins can be used as first-generation plug-ins to the Microsoft Developer Studio*. This capability combines the power of Intel compilers with the features of Microsoft's Integrated Development Environment (IDE).

The following summarizes the features for each of the compilers:

C/C++ plug-in:

  • Provides MMX™ technology support through the use of C "intrinsics." This allows C developers to take advantage of MMX Technology using C function call syntax rather than having to manually code assembly language statements.
  • Supports in-line assembly language insertions for C developers who have particular performance demands.
  • Offers profile-guided optimizations, which allow the compiler to adjust the flow of the program to achieve optimum performance based on previous executions with the same data set.
  • Provides a "blended" code optimization switch that allows you to generate code with optimal performance for any Intel Architecture processor. Also, for developers with more specific targets, the compilers provide processor-specific optimizations that maximize performance for a specific Intel processor.
  • Provides maximum floating-point instruction throughput by using the full power of the floating-point stack.

FORTRAN plug-in:

  • Supports dynamic COMMON
  • Supports thread-safe code generation for multi-threaded applications

System Requirements

The Intel C/C++ and FORTRAN compilers run under the Microsoft Windows* 95 or Windows NT* operating systems.

Language Support

The Intel C/C++ Compiler is a plug-in to Microsoft Visual C++* version 4.x/5.0, which provides the development and run-time environments plus the MFC* libraries. The Intel FORTRAN Compiler plugs in only to the Visual C++ version 4.x IDE.

NOTE: The Intel C/C++ Compiler plug-in (version 2.4) will operate with either Visual C++ 4.x or Visual C++ 5.0. However, some new features of Visual C++ 5.0, such as native COM, are not supported in this version of the Intel C/C++ Compiler.

The following FORTRAN languages and extensions are supported:

  • Full support for ANSI FORTRAN 77 (X3.9-1978) and ISO 1539:1980
  • Many extensions popularized by DEC* (VMS*)

Microsoft Visual C++ 4.x/5.0 Compatibility

The Intel C/C++ Compiler is compatible with Microsoft Visual C++ 4.x/5.0 in the following areas:

  • Compilation switches
  • Makefile support
  • In-line assembly language syntax
  • Object module, library, and DLL formats
  • Debug and symbol formats

If you have Microsoft Visual C++ 4.x/5.0 on your system when you install the Intel C/C++ Compiler, the installation procedure automatically integrates the Intel C/C++ Compiler into the Tools menu of the Visual C++ IDE. This gives you the choice of using the Intel C/C++ Compiler to compile the projects that you create in Visual C++. Just click on "Tools," then click on "Select Compiler," and the selection window provided by Intel appears.

Application Support

The Intel C/C++ Compiler plug-in is particularly efficient in support of the applications described in the sections that follow.

Graphics / Multimedia Applications

MMX™ technology adds 57 powerful new assembly instructions to the Intel Architecture instruction set which are designed to efficiently manipulate and process video, audio, and graphical data. The Pentium® and Pentium Pro processors with MMX™ technology include these new instructions to enhance performance of multimedia applications. The Intel C/C++ Compiler plug-in supports these new MMX instructions in C/C++ programs by using special compiler intrinsics that are coded using C function call syntax.

The compiler allows you to use C language variables in place of hardware registers, which frees you from managing these registers. The compiler generates the corresponding MMX instructions and reorders them to maximize performance through the Pentium processor’s dual instruction pipeline. In addition, the compiler also handles the loads and stores of the C variables to and from memory. Here is an example of an Intel C Compiler intrinsic and a description of its function:

__m64 _m_pmaddwd (__m64 m1, __m64 m2)

This intrinsic multiplies four 16-bit values in m1 by four 16-bit values in m2 to produce four 32-bit intermediate results, which are then summed by pairs to produce two 32-bit results.
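
To make the arithmetic concrete, the following is a minimal scalar sketch in portable C of what this packed multiply-add computes. It is a reference model only, not the intrinsic itself. The function name pmaddwd_ref and the array-based representation of the 64-bit operands are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical scalar model of the packed multiply-add described above.
 * m1 and m2 each stand for the four signed 16-bit lanes of a 64-bit
 * operand; out receives the two 32-bit results. */
static void pmaddwd_ref(const int16_t m1[4], const int16_t m2[4],
                        int32_t out[2])
{
    for (int i = 0; i < 2; i++) {
        /* Two 16-bit by 16-bit products, each widened to 32 bits. */
        int32_t p0 = (int32_t)m1[2 * i]     * (int32_t)m2[2 * i];
        int32_t p1 = (int32_t)m1[2 * i + 1] * (int32_t)m2[2 * i + 1];
        /* Sum the adjacent pair; add as unsigned so overflow wraps
         * instead of being undefined behavior. */
        out[i] = (int32_t)((uint32_t)p0 + (uint32_t)p1);
    }
}
```

For example, with m1 = {1, 2, 3, 4} and m2 = {5, 6, 7, 8}, the two results are 1*5 + 2*6 = 17 and 3*7 + 4*8 = 53.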

The Intel C/C++ Compiler plug-in also provides a rounding control option, which optimizes floating-point-to-integer conversions. The system default rounding mode is round-to-nearest. Because the C language requires that floating-point-to-integer conversions truncate, the compiler must generate additional instructions to change the rounding mode to truncation before each conversion and then restore it afterwards. The -Qrcd switch eliminates the overhead of switching the rounding mode back and forth. This option has no effect on floating-point calculations, but conversions to integer then use the current rounding mode and therefore do not conform to C semantics. Graphics applications that use floating-point data as input into their rendering operations can benefit from this type of optimization.

Consider the following example:

int a;
float f;

void func()
{
a = f;
}

The following is the standard code generation that would take place:

fld     DWORD PTR _f[0+eax*4]
fnstcw  [esp+24]
mov     DWORD PTR [esp+20], eax
mov     eax, DWORD PTR [esp+24]
or      eax, 3072
mov     DWORD PTR [esp+16], eax
mov     eax, DWORD PTR [esp+20]
fldcw   [esp+16]
fistp   DWORD PTR _a[0+eax*4]
fldcw   [esp+24]

Notice that it takes ten instructions to complete this function. Using the rounding control option -Qrcd, the optimized code looks like this:

fld     DWORD PTR _f[0+eax*4]
fistp   DWORD PTR _a[0+eax*4]

You can see that it takes only two instructions to complete the function. This has reduced the total number of instructions by 80%.
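
The change in conversion semantics that -Qrcd introduces can be illustrated in portable C. The sketch below contrasts the truncating conversion that C requires with a round-to-nearest-even conversion, which is what the FISTP instruction produces under the default rounding mode. Both function names are hypothetical. The second function is only a software model of the hardware behavior, and it assumes inputs well within the range of int.

```c
/* C-conforming conversion: the fractional part is discarded. */
static int to_int_truncate(float f)
{
    return (int)f;
}

/* Hypothetical model of a conversion performed under the default
 * round-to-nearest mode (ties go to the even integer). */
static int to_int_nearest_even(float f)
{
    int t = (int)f;               /* integer part, truncated toward zero */
    float frac = f - (float)t;    /* remaining fraction, in (-1, 1) */

    if (frac > 0.5f || (frac == 0.5f && (t & 1)))
        return t + 1;             /* round up, or break a tie toward even */
    if (frac < -0.5f || (frac == -0.5f && (t & 1)))
        return t - 1;             /* same, for negative inputs */
    return t;
}
```

For f = 2.7, the C-conforming conversion yields 2, while the round-to-nearest model yields 3. Code that relies on truncation should therefore be reviewed before the switch is enabled.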

Scientific / Engineering Applications

The Intel C/C++ Compiler plug-in provides analysis for interprocedural optimizations to assist you with programs that contain many small or medium-sized frequently used functions, especially for programs that contain calls within loops. Potential optimizations around calling points are normally inhibited due to a lack of information about what happens in the called procedure. Interprocedural analysis examines the relationship between calling and called procedures and enables the following optimizations:

  • Function inlining
  • Passing arguments in registers
  • Interprocedural constant propagation
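
As a rough illustration of the optimizations listed above, the following sketch shows a small function called inside a loop, followed by a hand-written equivalent of what inlining combined with interprocedural constant propagation could produce. The function names are hypothetical, and the second version is written by hand for comparison; it is not actual compiler output.

```c
/* Small helper called from a loop. Without interprocedural analysis the
 * compiler knows nothing about what happens inside the call. */
static int scale(int x, int factor)
{
    return x * factor;
}

/* Original form: one call per iteration, and factor is always 2. */
static int sum_scaled(const int *v, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += scale(v[i], 2);
    return sum;
}

/* Hand-written equivalent after inlining and constant propagation:
 * the call is gone and the constant factor is folded into the loop. */
static int sum_scaled_inlined(const int *v, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += v[i] * 2;
    return sum;
}
```

Both versions return the same result; the second simply removes the call overhead and exposes the loop body to further optimization.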

In addition, the Intel C/C++ Compiler plug-in exploits the use of the floating-point (FP) stack by implementing code generation optimizations that allow FP instructions to execute more efficiently. Most floating-point operations require that one operand and the result use the top of stack. This makes each FP instruction dependent on the previous one and inhibits overlapping the instructions. The compiler breaks this dependency by allowing a program to arrange for one of the inputs for the next operation to always be at the top of stack. It provides this capability by effective use of the FXCH instruction, which comes at almost no additional cost on the Pentium® Pro processor.

Consider the following expression:

a = ((b + c) * b) + ((d + e) * d);

This expression can be presented graphically as follows:

[Figure: Serial Instruction Sequence and Parallel Instruction Sequence]

The serial instruction sequence depicts instructions executed one at a time with no overlapping because of the top-of-stack dependency. The parallel instruction sequence uses the FXCH instruction that provides the following gains:

  • Instructions can overlap because each one can place its result in a different stack register rather than always at the top of the stack
  • More parallelism achieved
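
The source of the available parallelism is easier to see when the expression above is written with its two subexpressions separated. The following is a minimal sketch in C. The function name and the choice of double are assumptions for illustration, and the code says nothing about the exact instruction sequence the compiler emits.

```c
/* Evaluates a = ((b + c) * b) + ((d + e) * d) with the two independent
 * subexpressions written out separately. */
static double eval_expr(double b, double c, double d, double e)
{
    double left  = (b + c) * b;   /* chain 1: depends only on b and c */
    double right = (d + e) * d;   /* chain 2: depends only on d and e */
    return left + right;          /* the chains meet only at the final add */
}
```

Because neither chain depends on the other until the final addition, their instructions are candidates for overlapping once the top-of-stack dependency is removed.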

Database Server Applications

The Intel C/C++ Compiler plug-in assists large database applications through a combination of interprocedural analysis and profile-guided optimization. Profile-guided optimization provides detailed information on program execution, so you can optimize the performance-critical areas of large applications where most of the execution time is spent. Profile-guided optimizations can help eliminate instruction cache thrashing by reorganizing code layout, shrinking code size, and reducing branch mispredictions.

Information collected during program execution can be fed back into the compiler to allow a higher degree of optimization. For example, profile-guided optimization might find that a particular section of code is rarely executed. This code would then be moved to the end of the module, so that the processor fetches the frequently executed instructions more efficiently. The following are the three phases of profile-guided optimization that, when completed, provide the data that can significantly improve the performance of large applications:

Phase 1: Instrumentation Compilation. The compiler inserts code into your program to produce profile information. The resulting code is said to be instrumented by the compiler.

Phase 2: Instrumented Execution. When you execute the instrumented program, it creates a dynamic information file. This file contains data that represents the actual behavior of the program during execution.

Phase 3: Feedback Compilation. When you compile your program a second time, the compiler uses the data in the dynamic information file to help optimize your program. This data helps the compiler determine the most heavily traveled paths through the program and optimizes along these paths. You can use additional optimization switches during this phase so that other compilation optimization routines can also benefit from the dynamic information.
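
The kind of information gathered in the first two phases can be pictured with a hand-instrumented C sketch. The counters and function names below are hypothetical and stand in for the dynamic information file; the compiler's real instrumentation and file format are not described here and are not implied by this example.

```c
/* Hypothetical counters standing in for the dynamic information file. */
static unsigned long hot_count;
static unsigned long cold_count;

/* Hand-instrumented function: each branch records that it was taken. */
static int clamp_nonnegative(int value)
{
    if (value < 0) {
        cold_count++;      /* path expected to be rare */
        return 0;
    }
    hot_count++;           /* path expected to be common */
    return value;
}

/* "Instrumented execution": run the function over representative data. */
static long run_workload(const int *data, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += clamp_nonnegative(data[i]);
    return total;
}
```

After a representative run, a large gap between the two counters would suggest that the negative-value branch is a candidate for being moved out of the frequently executed code path during feedback compilation.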

Application Optimizations Summary

The following table summarizes the optimizations that the compiler applies to your program for each optimization switch. The entry "any" in the Option column means that the compiler automatically performs this optimization, even when optimizations are disabled.

Optimization | Affected Aspect of the Program | Option
optimized code selection | instruction selection / addressing modes | any
global register allocation | register use | -O1 / -O2
instruction scheduling | instruction reordering | -O1 / -O2
register variable detection | register use | -O1 / -O2
common subexpression elimination | constants and expression evaluation | -O1 / -O2
dead-code elimination | instruction sequencing | -O1 / -O2
variable renaming | register use | -O1 / -O2
loop-invariant code movement | instruction sequencing | -O1 / -O2
copy propagation | constants and expression evaluation | -O1 / -O2
constant propagation | constants and expression evaluation | -O1 / -O2
strength reduction/induction variable simplification | instruction selection/sequencing, constants and expression evaluation | -O1 / -O2
tail recursion elimination | calls, further optimization | -O1 / -O2
in-line function expansion | calls, jumps, branches, and loops | -Qip / -Qipo
interprocedural constant propagation | arguments, global variables, and return values | -Qip / -Qipo
passing arguments in registers | calls, register usage | -Qip / -Qipo
monitoring module-level static variables | further optimizations, loop invariant code | -Qip / -Qipo
multifile optimization | affects the same aspects as -Qip, but across multiple files | -Qipo
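
Two of the optimizations in the table, loop-invariant code movement and strength reduction, can be illustrated by applying them by hand. The sketch below uses hypothetical function names and shows a source-level equivalent of the transformations; it is not actual compiler output.

```c
/* Before: x * y is recomputed on every iteration, and the index
 * i * stride requires a multiplication on every iteration. */
static long sum_before(const int *v, int n, int stride, int x, int y)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += v[i * stride] + x * y;
    return sum;
}

/* After: the invariant product is hoisted out of the loop, and the
 * index multiplication is replaced by a running addition. */
static long sum_after(const int *v, int n, int stride, int x, int y)
{
    long sum = 0;
    int xy = x * y;        /* loop-invariant code movement */
    int idx = 0;           /* strength reduction: replaces i * stride */
    for (int i = 0; i < n; i++) {
        sum += v[idx] + xy;
        idx += stride;
    }
    return sum;
}
```

Both functions compute the same result; the second performs one multiplication in total instead of two per iteration.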

Future Enhancements

The following list summarizes the enhancements expected to be added to forthcoming releases of the Intel compiler products:

  • Support for FORTRAN 90 / FORTRAN 95 / MIL-STD-1753
  • Many extensions popularized by Cray*, IBM*, Sun*, and Microsoft*
  • Multi-threading support (including SGI*-compatible SMP directives)
  • Improved optimizer that requires less memory and runs faster
  • Automatic MMX™ technology code generation for vector operations
  • Global pointer tracking for improved alias detection
  • Improved dependence analysis for threading and loop transformations
  • Optimizations in presence of exception handling
  • Enhanced code and data layout optimizations to improve cache efficiency
  • Code coverage tool with a Graphical User Interface (GUI)
  • Interprocedural pointer analysis that includes knowledge of library functions

Conclusion

Intel is dedicated to providing a suite of software performance products to assist developers with creating the most powerful applications that run on Intel Architecture processors. The Intel C/C++ and FORTRAN compilers make up a part of this suite, and as Intel’s microprocessor technology evolves, our advanced compiler technology will be right there alongside our newest high-performance processors to let you benefit from every performance gain.


© 1998 Intel Corporation